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Abstract 

The mass composition of high energy cosmic rays above 10 17 eV is a crucial issue 
to solve some open questions in astrophysics such as the acceleration and propaga- 
tion mechanisms. Unfortunately, the standard procedures to identify the primary 
particle of a cosmic ray shower have low efficiency mainly due to large fluctuations 
and limited experimental observables. We present a statistical method for compo- 
sition studies based on several measurable features of the longitudinal development 
of the CR shower such as N max , X max , asymmetry, skewness and kurtosis. Principal 
component analysis (PCA) was used to evaluate the relevance of each parameter in 
the representation of the overall shower features and a linear discriminant analysis 
(LDA) was used to combine the different parameters to maximize the discrimina- 
tion between different particle showers. The new parameter from LDA provides a 
separation between primary gammas, proton and iron nuclei better than the pro- 
cedures based on X rnax only. The method proposed here was successfully tested in 
the energy range from 10 17 to 10 20 eV even when limitations of shower track length 
were included in order to simulate the field of view of fluorescence telescopes. 
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1 Introduction 



The most important questions about cosmic rays with energy above 10 17 eV 
can only be solved if the mass composition spectra is known. The complete 
understanding of the energy spectra and its features is only possible with the 
knowledge of the mass composition since it stores information about the source 
and the acceleration mechanisms. The evolution of the energy spectrum and 
the explanation of its possible features (2nd knee, ankle and the GZK cutoff) 
are intrinsically linked to the primary composition because the source power 
(in a Fermi mechanism) is directly proportional to the particle charge. Fur- 
thermore, the attenuation length of the particles due to the various energy loss 
mechanisms depends on the particle type and thus different mass composition 
can modulate the energy spectrum up to the highest energies. 

Present measurements with different techniques are not conclusive about the 
mass spectrum and its evolution with energy. Currently, composition studies 
have been done using fluorescence telescopes and ground arrays. Fluorescence 
telescopes are sensitive to the primary composition by measuring the depth in 
which the shower has the maximum number of particles (X max ) while ground 
based experiments usually extract the composition information from the muon 
density or the temporal structure of the signal. However, due to the large 
fluctuations in the shower development, the limited number of measurable 
parameters and the uncertainties in the hadronic interaction models the 
primary particle identification is very difficult and not possible on a event by 
event basis. 

Several experiments have published mass composition studies. The Haverah 
Park experiment has reported an iron fraction around 66% in the energy range 
0.2-1 EeV [I] and similar results have been measured by the Akeno [5] exper- 
iment. At energies above 10 19 eV, the AGASA experiment |6] measured an 
upper limit of the iron fraction of 35% in the range 10 19 — 10 195 eV and 76% 
in the range 10 19 - 5 - 10 20 eV. On the other hand, the HiRes Collaboration [7], 
measured an unchanging light composition above 10 18 eV and a change from 
heavy to light composition in the range 10 17 — 10 18 eV. Thus, the composition 
measurement for energies above 10 19 ' 5 eV are still inconclusive [8]. 

In this paper, we present the application of a statistical method to differentiate 
the primary particle based on several features of the longitudinal development 
of the CR shower rather than using only the depth of shower maximum (X max ). 
The longitudinal development of the CR shower can be well measured by 
fluorescence telescopes in operation by the HiRes and Auger Collaborations 
[9]. Our results are based on Monte Carlo simulations of the showers. 

This article is organized as follows: section [2] describes the longitudinal pro- 
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file and the X max parameter that is commonly used for composition studies. 
Section [3] describes the additional shower parameters proposed in this paper 
and the principal component analysis (PCA) used to evaluate them. Section 
[4] describes the linear discriminant analysis (LDA) and the results obtained 
using the newly defined parameters. Section [5] shows the results of the new 
method when the field of view of fluorescence telescopes is taken into account. 
A discussion of our results and future perspectives are summarized in the final 
section. 



2 Shower longitudinal profile and the X max parameter 

We have simulated air showers using the CORSIKA version 6.203 [10] Monte 
Carlo program with the QGSJET 01 [11J hadronic interaction model. Parti- 
cles in the shower were followed down to the energy of 0.05 GeV (muon and 
hadrons) and 50 keV (photons and electrons). The longitudinal development 
of the showers was sampled in vertical steps of 5 g/cm 2 . The thinning algo- 
rithm [T2"fT3"] was used with a thinning factor of 10~ 4 and a maximum weight 
of 10 5 . Primary photons have been simulated with the pre-shower [13] effect 
and . Two thousand shower were simulated for each type of primary particle 
and for each energy ranging from 10 17 eV to 10 20 eV. 

Figure [T] shows the simulated longitudinal profile of iron and proton show- 
ers. Shower profiles shown in this figure were shifted by its X max for better 
comparison of the profile shapes. 

Composition studies done by fluorescence telescopes are usually based on this 
X m ax parameter. Figure [2] shows the X max distribution for primary protons, 
iron nuclei and photons for showers simulated with primary energy of 10 18 eV. 
This figure illustrates that the X max parameter can be a good discriminator 
for gamma showers but its capability to differentiate protons from iron nuclei 
primary particles is quite limited due to the large overlap between the two 
distributions. In the energy range between 10 17 eV to 10 20 eV the discrimina- 
tion capability of X max varies as shown in figure [3} The elongation rate shows 
how the average X max varies with energy and is normally used to report the 
evolution of the composition with energy. Figure [3] also shows the RMS of the 
X m ax distribution as the error bars and the clear overlap of the X ma x param- 
eter for different primary particles indicates that the discrimination between 
the different particles becomes more difficult for higher energies. 

To quantify the separation capability hence the discrimination between the 
different primary particle distributions, we are going to use the merit factor 
(MF) statistical parameter. The merit factor between two distributions (A 
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and B) is denned as: 

MF = f~ B , (1) 



where A and B are the distributions averages, and a a and Ob the respective 
standard deviations. 



Figure |4^i shows the merit factor fluctuation as a function of the number of 
events. We have used the X max distribution of primary iron nuclei and protons 
as shown in figure [2] for this calculation. Random values were drawn follow- 
ing those distribution 1000 times and for each time a merit factor value was 
calculated. Following this procedure the fluctuation of the merit factor as a 
function of the number of showers could be calculated. Figure [4^ shows that 
the results presented here using the merit factor parameter have uncertainties 
smaller than 3% since we are always comparing two distributions with 2000 
events each, summing 4000 total events. We have also studied how the un- 
certainty varies with the fraction of iron nuclei in the total number of events. 
Figure [ijo shows that for any mixture of iron and proton the merit factor 
uncertainty is smaller than 3%. 

The merit factor achieved for the separation between proton and iron and 
proton and photon using the X max distributions is 1.20 and 1.38, respectively. 
These values are going to be compared to the merit factor obtained from the 
new analysis proposed here. 



3 New parameters and the principal component analysis 



Comparing the shapes of the longitudinal profile shown in figure [TJ it is possible 
to notice that there are differences between the two different primary parti- 
cles. Thus, we have looked for mathematical parameters that express these 
differences to improve the discrimination capacity between the particles. 

Several new parameters were tested and only some have shown good sepa- 
ration capabilities and the most powerful ones seem to be those related to 
the asymmetry of the longitudinal profiles. Below we list some tested param- 
eters and show its discrimination capability by quoting the merit factor (MF) 
between proton and iron distributions. 

X max :the atmospheric depth (g/cm 2 ) in which the shower has the maximum 
number of particles. It is the most used composition parameter and is cal- 
culated in any analysis procedure of fluorescence telescopes data. A indirect 
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measurement of the X max can also be done by water Cerenkov detectors via 
the rise time method [15]. MF = 1.20. 



N max : the number of particles in the shower at a X max . This is also a standard 
parameter calculated in any fluorescence telescope analysis and it is directly 
proportional to the shower energy. If the error in the energy reconstruction is 
large the inclusion of this parameter in the composition study would lead to a 
dependence with energy that is hard to disentangle. However, the fluorescence 
telescopes reconstruct the energy with an error of about 15% what we believe is 
a safe margin since variations in the energy of an EAS of this order do not affect 
the hypothesis about the chemical composition of the primary. MF = 1.00 



Asymmetry and Sigma: in order to measure the asymmetry of the distribution 
we have fit an asymmetric function to the longitudinal profile defined as: 

if ( X < X max ) 

Npart = N max exp Sigr ^2 

if ( X > X max ) 

jV — jV gXD X-Xmax 

part max f Asymmetry 2 * Sigma 2 

X max and N max are fixed in the fit. Asymmetry and sigma are the only two 
variables allowed to vary in the fit procedure. The asymmetry variable is a 
direct measure of the difference between the parts of the shower below and 
above X max . Sigma gives a measure of the width of the shower. Asymmetry: 
MF = 1.08 and Sigma: MF = 1.07. 



Skewness: is the third momentum of the distribution and is also a measurement 
of the asymmetry of the longitudinal distribution. MF = 2.13 



Kurtosis: is the fourth momentum of the distribution and is a combined mea- 
surement of the size of the peak and the tails. MF = 1.69 

Other parameters were tested and rejected. For instance, we have fit a linear 
function in the range between [X max — 400,X max — 100] and a second linear 
function in the range between [X max + 100,X max + 400]. The slope of these 
functions were taken as a measurement of the increase and decrease rate of 
the number of particles in the shower. However, no combination of the slopes 
has shown any capability to separate proton from iron showers. 

The distribution of the six parameters used in the composition studies pro- 
posed here are shown in figure [5] for proton and iron nuclei primaries. 
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Principal component analysis (PCA) was used to study the relevance of each 
parameter to the determination of the shower overall features. PCA is a linear 
transformation that rewrites a dataset into a new set of variables that are 
uncorrelated. These new variables, called principal components, are ordered 
by the amount of variance [16J. 

PCA has wide applications in astrophysics in analysis of spectra and galaxy 
surveys [TT][T8] and in analysis of the Gamma Ray Bursts' spectra [19], pa- 
rameter reduction in Dark Energy studies J2U], finding systematic differences 
between stellar populations |21| . among others. In cosmic ray research, it has 
been used as a method to distinguish photon generated showers from hadronic 
showers applied to the secondary particle distributions at ground level pro- 
duced by photons and protons showers [22]. In another work, PCA was used 
in the data analysis of Cerenkov Gamma Ray Telescopes [23J. 

We have applied PCA for proton and iron showers and defined the eigenvectors 
which characterizes their longitudinal profile. The first principal component is 
the eigenvector with the largest variance, related with the largest eigenvalue, 
the second principal component is associated with the second largest eigen- 
value and so on. Figure [6^1 shows the contribution of each parameter to the 
principal component parameters when applied to the proton initiated show- 
ers. Figure |6)d corresponds to the equivalent plot for the iron initiated showers. 
The weights shown in the figures were normalized to 100%. It is interesting 
to note that the relative weight of the new parameters such as the skewness 
and the kurtosis is considerably different between the two data sets. Also, by 
comparing the variance of the principal components distributions from the two 
data sets, we have noted that some of the principal components show different 
profiles, indicating that indeed there are clear differences between the relative 
weights of the featured parameters to the overall shower determination. To 
take advantage of these differences for discriminating the two different data 
sets, we have applied a statistical method known as the linear discrimination 
analysis. 



4 Linear discriminant analysis 

Linear discriminant analysis (LDA) was used to search for the best combina- 
tion of parameters to separate proton from iron nuclei showers and photon 
from hadronic primaries. LDA is a statistical discrimination method used to 
find a function of linear combinations of variables that maximizes the separa- 
tion between two or more classes of objects or events. It accomplishes that by 
maximizing the ratio of the variability between different groups, determined 
through the pooled (overall) covariance matrix, to the variability within each 
group (covariance matrix of each class) [24J. 
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In our analysis all six parameters presented in figure [5] were included in the 
LDA calculation. Training datasets for proton and iron showers were used to 
determine a set of discriminant coefficient vectors. These discriminant coef- 
ficient vectors are then used to calculate the LDA variables fl and f'2 for 
each new data point, that was not in the original training dataset. The differ- 
ence between the two LDA parameters (fl — f2) is expected to yield the best 
separation between the two populations. Applying this method to the pro- 
ton and iron populations, we have obtained a merit factor of 2.59. Including 
showers originated by photons in the same analysis keeping the discriminant 
coefficients that were calculated for proton and iron a separation merit fac- 
tor of 2.03 was achieved between photons and protons [7^l. To improve the 
photon-hadron discrimination, we repeated the procedure but using as train- 
ing datasets one group of hadrons (50%proton and 50% iron showers) and 
another group of photons induced showers in which merit factor of 

3.36 was achieved. It is worthwhile to emphasize that X max alone has a merit 
factor equal to 1.20 for proton and iron showers and a merit factor of 1.38 
between proton and photon distributions. 

The capability of the LDA parameters was tested as a function of energy, where 
we have recalculated the dependent parameters coefficient for each energy. 
Figure [8] shows the evolution of the LDA diagonal projection with energy 
calculated for proton and iron nuclei. This figure can be compared to the X max 
elongation rate (see figure [3]). Figure [3] shows a overlap of the X max distribution 
along the entire energy range while figure [8] shows a clear separation of the 
LDA parameter along the same energy range. 

A comparison of both methods as a function of energy can be seen in fig- 
ure [9] where we show the evolution of the merit factor for the LDA diagonal 
projection and for X max with energy. 



5 Simulation of the telescopes' field of view 

In order to test the effectiveness of the new analysis in a real situation with 
showers detected by fluorescence telescope we have reduced the track length of 
the showers used in our analysis and recalculated the discrimination capability. 
We have also repeated the same LDA with a reduced part of the longitudinal 
shower profile. Showers were limited to a maximum length ranging from 400 
g/cm 2 to 2000 g/cm 2 around its X max . 

The method presented here is based on the properties of the longitudinal 
profile and consequently the limited field of view of fluorescence telescopes 
is the main detection feature that affects the quality of the method. Detec- 
tor fluctuations and reconstruction uncertainties are expected to be a minor 
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contribution to degrade the method capabilities. However, it is clear that the 
exact discrimination values of the new parameters must be calculated for each 
specific telescope configuration and analysis procedure. 



Figure 10 shows the merit factor of each parameter as a function of the total 
track length of the shower considered in our analysis. Calculations were done 
for showers with primary energy equals to 10 18 eV but similar results were seen 
for all energies. As expected, kurtosis and skewness are efficient only when a 
large fraction of the shower is detected due to their dependence to the tails of 
the longitudinal distributions. Asymmetry and sigma remain good parameters 
even when the shower has 1000 g/cm 2 , but it also degrades very fast for lower 
track length. Also shown are the merit factor calculated using LDA, and which 
remained above 2 for all the entire range considered, showing that it yields 
better discrimination than considering each parameter independently. 



6 Conclusion 



We studied different features of the cosmic ray shower longitudinal profile aim- 
ing to determine a better set of parameters that can improve primary particle 
identification. The relevance of each parameter to the overall profile was stud- 
ied using a principal component analysis which showed that the relative weight 
of these additional parameters proposed in this analysis changed depending 
on the shower type. 

The discrimination capability of the different parameters was compared using 
a merit factor that measures how separate are the parameter distributions of 
the two different data sets. When evaluating the complete shower profile, we 
have determined that some of the proposed parameters such as the distribu- 
tion skewness and kurtosis show better discrimination capability than X max 
that is commonly used for composition analysis. However, these parameters 
loose very quickly its discrimination power when the track length of the shower 
is reduced. To combine the separation capability of all the parameters the sta- 
tistical method linear discrimination analysis was applied resulting in a new 
parameter with much better separation efficiency. With LDA, we were able to 
get separation merit factors above 2 sigma, between proton and iron initiated 
showers. For the separation between photon and hadron (proton+iron) show- 
ers, LDA yields a separation merit factor of 3.3 sigma for shower energy of 
10 18 eV. Comparing to the composition analysis using the standard X max , the 
LDA yields a better separation efficiency, even when considering incomplete 
shower profiles. It is clear that to have a better estimate of the applicability 
of this method to real data, a more complete simulation analysis considering 
a complete simulation of the detector response and atmospheric effects have 
to be considered. 
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In conclusion, we have shown that it is possible to improve composition studies 
of cosmic ray showers by considering a more complete set of variables that 
describe better the shower longitudinal profile. Further analysis with more 
complex and sophisticated statistical separation methods such as hierarchical 
clustering methods and neural network analysis can also further enhance this 
analysis. 
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Fig. 1. Longitudinal development of proton and iron nuclei showers. Each profile 
was shifted by its X max depth. Figure shows 100 proton and 100 iron nuclei showers. 




Fig. 2. Distribution of X max for proton, iron nuclei and photon primaries. 
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Fig. 3. Simulated elongation rate for proton, iron nuclei and photon primaries. Error 
bars represent the RMS of the X max distribution. 




Fig. 4. (a) Error dependence of the merit factor with the number of events (b) 
and merit factor error dependence on the relative fraction of events between proton 
showers and iron showers. 
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Fig. 5. Distribution the X max , N max , sigma, asymmetry,skewness and kurtosis for 
proton and iron nuclei shower with primary energy Eq = 10 18 eV. 
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Fig. 6. Weight of the parameters X max , N max , asymmetry, sigma, skewness and 
kurtosis in the PC A defined parameters fl, f2, f3, f4, f5 and f6 for proton (a) and 
iron(b) shower with energy 1 EeV. 




LDA (f 1 -f2) LDA(f1-f2) 

Fig. 7. Distribution of LDA parameters (/l — /2) calculated for proton, iron and 
photon showers, using protons and irons as LDA training datasets in the left plot 
and using photon and hadrons as LDA training datasets in the right plot. 
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Fig. 8. LDA parameter mean for protons and iron showers as a function of energy. 
Error bars represent the standard deviation of the distribution at each energy. 





° MFofXmax 




• MF of LDA proj. 


■Ii. 


• 

• 




■ 


I 1 1 | 






o o ° 


~ 








16.5 17 17.5 18 18.5 19 19.5 20 20.5 



log10(E o (eV)) 

Fig. 9. Merit factor between proton and iron distributions for the LDA parameter 
and X max as a function of energy. 
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Fig. 10. Merit factor between proton and iron distributions for shower parameters, 
sigma, asymmetry, skewness and kurtosis compared to the MF obtained from LDA 
as a function of profile slant depth. 
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